Abstract

Confidence intervals are assessed according to two criteria, namely expected length 
and coverage probability. In an attempt to apply the decision-theoretic method to 
finding a good confidence interval, a loss function that is a linear combination of the 
interval length and the indicator function that the interval includes the parameter 
of interest has been proposed. We consider the particular case that the parameter 
of interest is the normal mean, when the variance is unknown. Casella, Hwang 
and Robert, Statistica Sinica, 1993, have shown that this loss function, combined 
with the standard noninformative prior, leads to a generalized Bayes rule that is a 
confidence interval for this parameter which has "paradoxical behaviour". We show
that a simple modification of this loss function, combined with the same prior, leads
to a generalized Bayes rule that is the usual confidence interval, i.e. the "paradoxical
behaviour" is removed.
1. Introduction

Suppose that the random vector $X$ has pmf or pdf $f(x \mid \theta)$, where $x \in \mathcal{X}$ and
$\theta \in \Omega_\theta$. Also suppose that either (a) $\psi = \theta$ is the parameter of interest or (b)
$\theta^T = (\psi^T, \tau^T)$ with $\psi$ the parameter of interest and $\Omega_\theta = \Omega_\psi \times \Omega_\tau$. The
decision-theoretic approach to finding a good point estimator $\delta(X)$ of $\psi$ may be described as
follows. Define the loss function $L(\theta, d)$ for the value $d$ of the estimate of $\psi$, when the
true parameter value is $\theta$. Then define the risk function $R(\theta, \delta) = E_\theta(L(\theta, \delta(X)))$,
where $E_\theta$ denotes the expectation according to the pmf or pdf $f(x \mid \theta)$ of $X$. Choose
a prior pdf $\pi$ (possibly improper) such that minimizing the posterior expected loss,
with respect to $\delta(x)$ for each $x \in \mathcal{X}$, yields a good (generalized) Bayes rule estimator.
Conditions for admissibility and for minimaxity of this estimator are well-known (see
e.g. Berger, 1985, Lehmann and Casella, 1998 and Robert, 1994).

Finding a good set estimator $C(X)$ of $\psi$ is much more difficult than finding
a good point estimator of $\psi$. This is because a confidence set $C(X)$ is assessed
according to two criteria, namely expected volume and coverage probability. We now
have two loss functions and the decision-theoretic approach does not apply directly.
One attempt to apply the decision-theoretic approach is to define the following loss
function, which is a linear combination of the volume of the set and the indicator
function that the set includes $\psi$:

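$$L(\theta, C) = \mathrm{vol}(C) - k\, \mathcal{I}(\psi \in C), \tag{1}$$

where $k > 0$ and

$$\mathcal{I}(A) = \begin{cases} 1 & \text{if } A \text{ is true} \\ 0 & \text{if } A \text{ is false} \end{cases}$$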

for any statement A. This leads to the risk function 

$$R(\theta, C) = E_\theta(L(\theta, C(X))) = E_\theta(\mathrm{vol}(C(X))) - k\, P_\theta(\psi \in C(X)),$$

where $P_\theta$ denotes the probability according to the pmf or pdf $f(x \mid \theta)$ of $X$. One
then seeks $k$ and prior pdf $\pi$ such that minimizing the posterior expected loss, with
respect to $C(x)$ for each $x \in \mathcal{X}$, yields a good confidence set $C(X)$ for $\psi$.

However, as pointed out by Casella and Berger (1990) and Casella, Hwang and 
Robert (1993), this procedure may lead to very poor confidence sets (confidence sets 
with "paradoxical behaviour"). For the remainder of the introduction and in Section 
2, we consider the case that $X = (X_1, \ldots, X_n)$ where $X_1, \ldots, X_n$ are iid $N(\mu, \sigma^2)$
with $\mu$ and $\sigma^2$ unknown, $\theta = (\mu, \sigma^2)$ and the parameter of interest is $\mu$. For this
case, Casella, Hwang and Robert (1993) show that, for the standard noninformative
prior pdf $\pi(\theta) = 1/\sigma^2$ for $\theta$, the generalized Bayes rule is a very poor confidence
interval. These authors show, however, that the use of the more general class of loss
functions

$$L_m(\theta, C) = m(\mathrm{length}(C)) - \mathcal{I}(\mu \in C),$$

where m is an appropriately-chosen nonlinear and nondecreasing function, can solve 
this problem. 

In Section 2, we consider the following simple modification of the loss function
(1):

$$L(\theta, C) = \frac{\mathrm{length}(C)}{\sigma} - k\, \mathcal{I}(\mu \in C). \tag{2}$$

We show that the standard noninformative prior pdf for $\theta$ leads to a generalized
Bayes rule that is the usual confidence interval for $\mu$. In other words, the "paradoxical
behaviour" is removed. However, as discussed in Section 3, we do not advocate
the use of generalizations of the loss function (2) in other contexts.
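
To make the difference between the loss functions (1) and (2) concrete for an interval
$C = [\ell, u]$, the following minimal Python sketch evaluates both. The function names
`loss_length` and `loss_scaled` and the numerical values are illustrative assumptions,
not taken from the cited references. The sketch only illustrates one elementary property:
a change in the units of measurement alters loss (1) but leaves loss (2) unchanged.

```python
def loss_length(lower, upper, mu, k):
    """Loss (1) for an interval: length minus k times the coverage indicator."""
    covered = 1.0 if lower <= mu <= upper else 0.0
    return (upper - lower) - k * covered


def loss_scaled(lower, upper, mu, sigma, k):
    """Loss (2): length in units of sigma, minus k times the coverage indicator."""
    covered = 1.0 if lower <= mu <= upper else 0.0
    return (upper - lower) / sigma - k * covered


if __name__ == "__main__":
    k = 3.0  # hypothetical weight on coverage
    # Original units: interval [4, 6], mu = 5, sigma = 2.
    print(loss_length(4.0, 6.0, 5.0, k), loss_scaled(4.0, 6.0, 5.0, 2.0, k))
    # Same situation after multiplying every quantity by 10.
    print(loss_length(40.0, 60.0, 50.0, k), loss_scaled(40.0, 60.0, 50.0, 20.0, k))
```

In this example, loss (1) changes from -1 to 17 under the rescaling, whereas loss (2)
equals -2 in both cases.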

2. Confidence intervals for the normal mean obtained by using the new
loss function (2)

Suppose that $X_1, \ldots, X_n$ are iid $N(\mu, \sigma^2)$ where both $\mu$ and $\sigma^2$ are unknown
($\mu \in \mathbb{R}$, $\sigma^2 \in (0, \infty)$). Let $\theta = (\mu, \sigma^2)$ and suppose that $\mu$ is the parameter of
interest. Also let $\bar{X} = \sum_{i=1}^n X_i / n$ and $S = \sqrt{\sum_{i=1}^n (X_i - \bar{X})^2 / (n - 1)}$. Define the
quantile $t(m)$ by the requirement that $P(-t(m) \le T \le t(m)) = 1 - \alpha$ for $T \sim t_m$.
The usual $1 - \alpha$ confidence interval for $\mu$ is $[\bar{X} - t(n-1) S/\sqrt{n},\ \bar{X} + t(n-1) S/\sqrt{n}]$.
Suppose that $\theta$ has the improper prior pdf $\pi(\theta) = 1/\sigma^2$. This is the standard
noninformative prior pdf for $\theta$. Use the new loss function $L(\theta, C)$, given by (2). In
this section, we prove that the generalized Bayes rule is, for the appropriate choice
of $k$, the usual $1 - \alpha$ confidence interval for $\mu$.
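
As a concrete illustration, the usual $1 - \alpha$ interval can be computed as in the
following sketch. It uses only the Python standard library, so the quantile $t(n-1)$ is
obtained numerically (Simpson's rule for the cdf, bisection for the quantile). The helper
names and the sample data are hypothetical.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """t cdf: 1/2 plus or minus a Simpson's-rule integral of the pdf over [0, |x|]."""
    a = abs(x)
    if a == 0.0:
        return 0.5
    width = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * width, df)
    half = total * width / 3
    return 0.5 + half if x > 0 else 0.5 - half


def t_quantile(alpha, df):
    """Return t(df) satisfying P(-t <= T <= t) = 1 - alpha, by bisection."""
    target = 1 - alpha / 2
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:  # expand until the root is bracketed
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def usual_interval(data, alpha=0.05):
    """Usual 1 - alpha interval [xbar - t(n-1) s / sqrt(n), xbar + t(n-1) s / sqrt(n)]."""
    n = len(data)
    xbar = sum(data) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in data) / (n - 1))
    half_width = t_quantile(alpha, n - 1) * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width


if __name__ == "__main__":
    sample = [4.9, 5.1, 5.3, 4.7, 5.0, 5.2, 4.8, 5.4, 4.6, 5.0]  # made-up data
    print(usual_interval(sample, alpha=0.05))
```

Where SciPy is available, the numerical quantile could instead be taken from
`scipy.stats.t.ppf(1 - alpha / 2, n - 1)`.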

Since $(\bar{X}, S^2)$ is a sufficient statistic for $\theta$, we consider confidence intervals for $\mu$
of the form $C(\bar{X}, S) = [\ell(\bar{X}, S), u(\bar{X}, S)]$. Define the posterior expected loss

$$E(L(\theta, C(\bar{X}, S)) \mid \bar{x}, s) = E(L(\theta, C(\bar{x}, s)) \mid \bar{x}, s),$$



where $E(\cdot \mid \bar{x}, s)$ denotes the expectation according to the posterior distribution of
$\theta$, i.e. the distribution of $\theta$ conditional on $(\bar{X}, S) = (\bar{x}, s)$. The posterior expected
loss is equal to

$$(u(\bar{x}, s) - \ell(\bar{x}, s))\, E(1/\sigma \mid \bar{x}, s) - k\, P(\mu \in C(\bar{x}, s) \mid \bar{x}, s), \tag{3}$$

where $P(\cdot \mid \bar{x}, s)$ denotes the probability according to the posterior distribution of $\theta$.
As is well-known (see e.g. p. 215 of Robert, 1994), the marginal posterior distribution
of $\mu$ is such that

$$\frac{\sqrt{n}\,(\mu - \bar{x})}{s} \sim t_{n-1}.$$



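Thus

$$P(\mu \in C(\bar{x}, s) \mid \bar{x}, s) = P\left(\frac{\sqrt{n}\,(\ell(\bar{x}, s) - \bar{x})}{s} \le T \le \frac{\sqrt{n}\,(u(\bar{x}, s) - \bar{x})}{s}\right),$$

where $T \sim t_{n-1}$. As is well-known (see e.g. Box and Tiao, 1973), the marginal
posterior pdf of $\sigma$ is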



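$$c(n, s)\, \sigma^{-n} \exp\left(-\frac{(n-1)s^2}{2\sigma^2}\right)$$

for $\sigma > 0$, where

$$c(n, s) = \left[\frac{1}{2}\, \Gamma\!\left(\frac{n-1}{2}\right)\right]^{-1} \left(\frac{(n-1)s^2}{2}\right)^{(n-1)/2}.$$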



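Hence

$$E(1/\sigma \mid \bar{x}, s) = c(n, s) \int_0^\infty \sigma^{-(n+1)} \exp\left(-\frac{(n-1)s^2}{2\sigma^2}\right) d\sigma = c_1(n)/s,$$

where

$$c_1(n) = \frac{\Gamma(n/2)}{\Gamma((n-1)/2)} \sqrt{\frac{2}{n-1}},$$

by (A2.1.4) on p. 145 of Box and Tiao (1973). Thus the posterior expected loss
is equal to

$$\frac{c_1(n)\,(u(\bar{x}, s) - \ell(\bar{x}, s))}{s} - k\, P\left(\frac{\sqrt{n}\,(\ell(\bar{x}, s) - \bar{x})}{s} \le T \le \frac{\sqrt{n}\,(u(\bar{x}, s) - \bar{x})}{s}\right).$$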
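
The identity $E(1/\sigma \mid \bar{x}, s) = c_1(n)/s$ can be checked numerically. The following
sketch, which uses only the Python standard library, integrates $1/\sigma$ against the
marginal posterior pdf of $\sigma$ and compares the result with $c_1(n)/s$. The function names
are hypothetical. As a computational convenience (an assumption of this sketch, not a
step in the argument), the integral is evaluated after the substitution $u = 1/\sigma$, which
turns the integrand into $c(n, s)\, u^{n-1} \exp(-(n-1)s^2 u^2/2)$ on $(0, \infty)$.

```python
import math


def c1(n):
    """c_1(n) = Gamma(n/2) / Gamma((n-1)/2) * sqrt(2 / (n-1))."""
    log_ratio = math.lgamma(n / 2) - math.lgamma((n - 1) / 2)
    return math.exp(log_ratio) * math.sqrt(2 / (n - 1))


def expected_inverse_sigma(n, s, steps=20000):
    """Simpson's-rule approximation to E(1/sigma | s) under the posterior pdf of sigma.

    With u = 1/sigma the integrand becomes c(n, s) * u**(n-1) * exp(-a * u**2),
    where a = (n-1) * s**2 / 2. The upper limit 12/s truncates a negligible tail.
    """
    a = (n - 1) * s * s / 2
    # log of c(n, s) = [Gamma((n-1)/2) / 2]^(-1) * a^((n-1)/2)
    log_c = math.log(2) - math.lgamma((n - 1) / 2) + (n - 1) / 2 * math.log(a)

    def integrand(u):
        if u == 0.0:
            return 0.0  # u**(n-1) vanishes at 0 for n >= 2
        return math.exp(log_c + (n - 1) * math.log(u) - a * u * u)

    upper = 12 / s
    width = upper / steps  # steps must be even for Simpson's rule
    total = integrand(0.0) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * width)
    return total * width / 3


if __name__ == "__main__":
    for n, s in [(2, 1.0), (5, 0.5), (10, 2.0), (30, 3.0)]:
        print(n, s, expected_inverse_sigma(n, s), c1(n) / s)
```

For each pair $(n, s)$ the two printed values should agree to within the accuracy of the
numerical integration.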

Let $(\ell^*(\bar{x}, s), u^*(\bar{x}, s))$ denote the value of $(\ell(\bar{x}, s), u(\bar{x}, s))$ minimizing the posterior
expected loss, subject to $u(\bar{x}, s) \ge \ell(\bar{x}, s)$. We find this minimizing value as follows.
Define the following function of $(q, r)$:

$$\frac{r - q}{s} - k_1(n)\, P\left(\frac{\sqrt{n}\, q}{s} \le T \le \frac{\sqrt{n}\, r}{s}\right), \tag{4}$$

where $k_1(n) = k/c_1(n)$. Let $(q^*, r^*)$ denote the value of $(q, r)$ minimizing (4), subject
to $r \ge q$. Then set $\ell^*(\bar{x}, s) = \bar{x} + q^*$ and $u^*(\bar{x}, s) = \bar{x} + r^*$. Let $h = (r - q)/2$ and
suppose that $r > q$, so that $h > 0$. Thus (4) is equal to

$$\frac{2h}{s} - k_1(n)\, P\left(\frac{\sqrt{n}\, q}{s} \le T \le \frac{\sqrt{n}\,(q + 2h)}{s}\right). \tag{5}$$

We minimize this with respect to $(q, h)$, where $h > 0$, in two steps as follows. In the
first step, we minimize (5) with respect to $q$ for fixed $h > 0$. We then substitute this
minimizing value of $q$ into (5) and minimize the resulting expression with respect to
$h > 0$. For fixed $h > 0$, we minimize (5) with respect to $q$ by maximizing

$$P\left(q \le \frac{s\,T}{\sqrt{n}} \le q + 2h\right)$$

with respect to $q$. Since the $t_{n-1}$ pdf is symmetric about 0 and unimodal, this is
maximized by centring the interval at 0, i.e. by setting $q = -h$. Substituting this
value of $q$ into (5), we obtain the following function of $h$:



$$\frac{2h}{s} - k_1(n) \left(2 F_{n-1}\!\left(\frac{\sqrt{n}\, h}{s}\right) - 1\right), \tag{6}$$



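where $F_{n-1}$ denotes the $t_{n-1}$ cdf. Multiplying (6) by $\sqrt{n}$, we obtain

$$\frac{2\sqrt{n}\, h}{s} - k_2(n) \left(2 F_{n-1}\!\left(\frac{\sqrt{n}\, h}{s}\right) - 1\right), \tag{7}$$

where $k_2(n) = \sqrt{n}\, k_1(n)$. Minimization of (6) with respect to $h > 0$ is equivalent to
minimization of (7) with respect to $h > 0$ and, on dividing (7) by 2 and dropping the
additive constant $k_2(n)/2$, this is equivalent to minimizing

$$\frac{\sqrt{n}\, h}{s} - k_2(n)\, F_{n-1}\!\left(\frac{\sqrt{n}\, h}{s}\right)$$

with respect to $h > 0$. Set

$$k = \frac{1}{\sqrt{n}}\, \frac{c_1(n)}{f_{n-1}(t(n-1))},$$

where $f_{n-1}$ denotes the $t_{n-1}$ pdf. Thus $k_2(n) = 1/f_{n-1}(t(n-1))$. Our aim, therefore,
is to minimize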



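$$g(h) = \frac{\sqrt{n}\, h}{s} - \frac{1}{f_{n-1}(t(n-1))}\, F_{n-1}\!\left(\frac{\sqrt{n}\, h}{s}\right)$$

with respect to $h > 0$. Now

$$\frac{dg(h)}{dh} = \frac{\sqrt{n}}{s} - \frac{1}{f_{n-1}(t(n-1))}\, f_{n-1}\!\left(\frac{\sqrt{n}\, h}{s}\right) \frac{\sqrt{n}}{s}
= \frac{\sqrt{n}}{s} \left(1 - \frac{f_{n-1}(\sqrt{n}\, h/s)}{f_{n-1}(t(n-1))}\right).$$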
This derivative is an increasing function of $h > 0$, takes a negative value for
$h = 0$ and is equal to 0 when $\sqrt{n}\, h/s = t(n-1)$. Therefore, $g(h)$ is minimized with
respect to $h$ by setting $\sqrt{n}\, h/s = t(n-1)$, so that $h = t(n-1)\, s/\sqrt{n}$ and
$C(\bar{x}, s) = [\bar{x} - t(n-1)\, s/\sqrt{n},\ \bar{x} + t(n-1)\, s/\sqrt{n}]$, the
usual $1 - \alpha$ confidence interval for $\mu$.
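
This conclusion can be checked numerically. The sketch below, a minimal illustration
using only the Python standard library, minimizes
$g(h) = \sqrt{n}\, h/s - F_{n-1}(\sqrt{n}\, h/s)/f_{n-1}(t(n-1))$ directly by ternary search and
compares the minimizer with $t(n-1)\, s/\sqrt{n}$. The function names, the choice of ternary
search and the numerical approximations to the $t_{n-1}$ cdf and quantile are assumptions
of this sketch, not part of the proof.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=1000):
    """t cdf: 1/2 plus or minus a Simpson's-rule integral of the pdf over [0, |x|]."""
    a = abs(x)
    if a == 0.0:
        return 0.5
    width = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * width, df)
    half = total * width / 3
    return 0.5 + half if x > 0 else 0.5 - half


def t_quantile(alpha, df):
    """Return t(df) satisfying P(-t <= T <= t) = 1 - alpha, by bisection."""
    target = 1 - alpha / 2
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def minimizing_half_width(n, s, alpha=0.05):
    """Minimize g(h) over h > 0 by ternary search; g is convex on h > 0."""
    df = n - 1
    scale = math.sqrt(n) / s
    f_at_t = t_pdf(t_quantile(alpha, df), df)

    def g(h):
        return scale * h - t_cdf(scale * h, df) / f_at_t

    hi = s / math.sqrt(n)
    while g(2 * hi) < g(hi):  # once g turns upward, the minimizer lies in [0, 2 * hi]
        hi *= 2
    lo, hi = 0.0, 2 * hi
    for _ in range(60):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if g(m1) < g(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2


if __name__ == "__main__":
    n, s, alpha = 10, 2.0, 0.05  # hypothetical values
    print(minimizing_half_width(n, s, alpha))
    print(t_quantile(alpha, n - 1) * s / math.sqrt(n))
```

The two printed values should agree up to the tolerance of the search, which is
consistent with the half-width $h = t(n-1)\, s/\sqrt{n}$ obtained above.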

3. Discussion 

The loss function (2) can be generalized in the obvious way to other contexts
where there is a scaling parameter (analogous to $\sigma$). However, we do not advocate
the use of such a loss function. The expected volume and coverage probability of a
confidence set are very different criteria. An attempt to shoehorn these criteria into
a single risk function that is a linear combination of these criteria does not seem
to be the appropriate strategy. It is better to solve the problem of finding a
confidence set that minimizes a weighted average (over the parameter space $\Omega_\theta$) of
the expected length, subject to the constraint that this confidence set has coverage
probability that never falls below the specified value $1 - \alpha$. In the case that $\theta$
is a scalar and the parameter of interest, an ingenious solution to this problem is
provided by Pratt (1961). Farchione and Kabaila (2008) and Kabaila and Giri (2009a,b)
solve this problem in particular settings by computational means.